Results 1 - 20 of 43
1.
J Am Med Inform Assoc ; 31(5): 1144-1150, 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38447593

ABSTRACT

OBJECTIVE: To evaluate the real-world performance of the SMART/HL7 Bulk Fast Healthcare Interoperability Resources (FHIR) Access Application Programming Interface (API), developed to enable push-button access to electronic health record data on large populations and required under the 21st Century Cures Act Rule. MATERIALS AND METHODS: We used an open-source Bulk FHIR Testing Suite at 5 healthcare sites from April to September 2023, including 4 hospitals using electronic health records (EHRs) certified for interoperability, and 1 Health Information Exchange (HIE) using a custom, standards-compliant API build. We measured export speeds, data sizes, and completeness across 6 types of FHIR resources. RESULTS: Among the certified platforms, Oracle Cerner led in speed, managing 5-16 million resources at over 8,000 resources/min. Three Epic sites exported a FHIR data subset, achieving 1-12 million resources at 1,555-2,500 resources/min. Notably, the HIE's custom API outperformed both, generating over 141 million resources at 12,000 resources/min. DISCUSSION: The HIE's custom API showcased superior performance, endorsing the effectiveness of SMART/HL7 Bulk FHIR in enabling large-scale data exchange while underlining the need for optimization in existing EHR platforms. Agility and scalability are essential for diverse health, research, and public health use cases. CONCLUSION: To fully realize the interoperability goals of the 21st Century Cures Act, addressing the performance limitations of the Bulk FHIR API is critical. It would be beneficial to include performance metrics in both certification and reporting processes.
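The export flow benchmarked above is the asynchronous kick-off/poll/download protocol of the FHIR Bulk Data Access specification. A minimal Python sketch of that flow, with a pluggable transport so no live server or credentials are assumed (the URLs are illustrative, and a production client would also attach a SMART Backend Services OAuth token in an Authorization header):

```python
import json
import time

def bulk_export(http_get, base_url, group_id, poll_interval=1.0, max_polls=60):
    """Sketch of the SMART/HL7 Bulk FHIR Access flow: kick off a Group-level
    $export, poll the status endpoint, and return the NDJSON file manifest.

    `http_get` is any callable (url, headers) -> (status_code, headers, body),
    so a real HTTP client or a test stub can be plugged in."""
    # Kick-off request: asynchronous export of all data for a patient group.
    status, headers, _ = http_get(
        f"{base_url}/Group/{group_id}/$export",
        {"Accept": "application/fhir+json", "Prefer": "respond-async"},
    )
    if status != 202:
        raise RuntimeError(f"kick-off failed with HTTP {status}")
    poll_url = headers["Content-Location"]

    # Poll until the server reports the export is complete (HTTP 200).
    for _ in range(max_polls):
        status, headers, body = http_get(poll_url, {"Accept": "application/json"})
        if status == 200:
            manifest = json.loads(body)
            # Each manifest entry names a resource type and an NDJSON file URL.
            return [(f["type"], f["url"]) for f in manifest["output"]]
        if status != 202:
            raise RuntimeError(f"export failed with HTTP {status}")
        time.sleep(poll_interval)
    raise TimeoutError("export did not complete in time")
```

Export speed, as measured in the study, is then simply resources downloaded divided by elapsed wall-clock time across this whole flow.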


Subjects
Health Information Exchange, Health Level Seven, Software, Electronic Health Records, Delivery of Health Care
2.
medRxiv ; 2024 Feb 06.
Article in English | MEDLINE | ID: mdl-38370642

ABSTRACT

Objective: To address challenges in large-scale electronic health record (EHR) data exchange, we sought to develop, deploy, and test an open-source, cloud-hosted app 'listener' that accesses standardized data across the SMART/HL7 Bulk FHIR Access application programming interface (API). Methods: We advance a model for scalable, federated data sharing and learning. Cumulus software is designed to address key technology and policy desiderata, including local utility, control, and administrative simplicity, as well as privacy preservation during robust data sharing and AI for processing unstructured text. Results: Cumulus relies on containerized, cloud-hosted software installed within a healthcare organization's security envelope. Cumulus accesses EHR data via the Bulk FHIR interface and streamlines automated processing and sharing. The modular design enables use of the latest AI and natural language processing tools and supports provider autonomy and administrative simplicity. In an initial test, Cumulus was deployed across five healthcare systems, each partnered with public health. Cumulus outputs patient counts, which were aggregated into a table stratifying variables of interest to enable population health studies. All code is available open source. A policy stipulating that only aggregate data leave the institution greatly facilitated data sharing agreements. Discussion and Conclusion: Cumulus addresses barriers to data sharing based on (1) federally required support for standard APIs, (2) increasing use of cloud computing, and (3) advances in AI. There is potential for scalability to support learning across myriad network configurations and use cases.
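The "only aggregate data leave the institution" output described above amounts to collapsing patient-level rows into a stratified count table before anything is shared. A minimal sketch of that step; the field names are hypothetical, not Cumulus's actual schema:

```python
from collections import Counter

def aggregate_counts(patients, strata):
    """Collapse patient-level records into a count table stratified by the
    variables of interest; only (stratum values..., count) rows remain, so
    no patient-level data leaves the institution."""
    table = Counter(tuple(p[k] for k in strata) for p in patients)
    return sorted((*key, n) for key, n in table.items())

rows = aggregate_counts(
    [
        {"sex": "F", "age_band": "18-64", "covid_dx": True},
        {"sex": "F", "age_band": "18-64", "covid_dx": True},
        {"sex": "M", "age_band": "65+", "covid_dx": False},
    ],
    strata=("sex", "age_band", "covid_dx"),
)
```

Tables of this shape from several sites can then be pooled for population health studies without any data sharing agreement covering row-level data.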

3.
medRxiv ; 2023 Oct 06.
Article in English | MEDLINE | ID: mdl-37873390

ABSTRACT

Objective: To evaluate the real-world performance of the SMART/HL7 Bulk FHIR Access API in delivering patient data on populations, as required in Electronic Health Records (EHRs) under the 21st Century Cures Act Rule. Materials and Methods: We used an open-source Bulk FHIR Testing Suite at five healthcare sites from April to September 2023, including four hospitals using EHRs certified for interoperability, and one Health Information Exchange (HIE) using a custom, standards-compliant API build. We measured export speeds, data sizes, and completeness across six types of FHIR resources. Results: Among the certified platforms, Oracle Cerner led in speed, managing 5-16 million resources at over 8,000 resources/min. Three Epic sites exported a FHIR data subset, achieving 1-12 million resources at 1,555-2,500 resources/min. Notably, the HIE's custom API outperformed both, generating over 141 million resources at 12,000 resources/min. Discussion: The HIE's custom API showcased superior performance, endorsing the effectiveness of SMART/HL7 Bulk FHIR in enabling large-scale data exchange while underlining the need for optimization in existing EHR platforms. Agility and scalability are essential for diverse health, research, and public health use cases. Conclusion: To fully realize the interoperability goals of the 21st Century Cures Act, addressing the performance limitations of the Bulk FHIR API is critical. It would be beneficial to include performance metrics in both certification and reporting processes.

4.
Learn Health Syst ; 6(2): e10309, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35434359

ABSTRACT

The growing availability of multi-scale biomedical data sources that can be used to enable research and improve healthcare delivery has brought about what can be described as a healthcare "data age." This new era is defined by the explosive growth in bio-molecular, clinical, and population-level data that can be readily accessed by researchers, clinicians, and decision-makers, and utilized for systems-level approaches to hypothesis generation and testing as well as operational decision-making. However, taking full advantage of these unprecedented opportunities necessitates revisiting the alignment between traditionally academic biomedical informatics (BMI) and operational healthcare information technology (HIT) personnel and activities in academic health systems. While the history of the academic field of BMI includes active engagement in the delivery of operational HIT platforms, in many contemporary settings these efforts have grown distinct. Recent experiences during the COVID-19 pandemic have demonstrated greater coordination of BMI and HIT activities that has allowed organizations to respond to pandemic-related changes more effectively, with demonstrable and positive impact as a result. In this position paper, we discuss the challenges and opportunities associated with driving alignment between BMI and HIT, as viewed from the perspective of a learning healthcare system. In doing so, we hope to illustrate the benefits of coordination between BMI and HIT in terms of the quality, safety, and outcomes of care provided to patients and populations, demonstrating that these two groups can be "better together."

5.
Clin Epidemiol ; 14: 369-384, 2022.
Article in English | MEDLINE | ID: mdl-35345821

ABSTRACT

Purpose: Routinely collected real-world data (RWD) have great utility in aiding the novel coronavirus disease (COVID-19) pandemic response. Here, we present the international Observational Health Data Sciences and Informatics (OHDSI) Characterizing Health Associated Risks and Your Baseline Disease In SARS-COV-2 (CHARYBDIS) framework for standardisation and analysis of COVID-19 RWD. Patients and Methods: We conducted a descriptive retrospective database study using a federated network of data partners in the United States, Europe (the Netherlands, Spain, the UK, Germany, France and Italy) and Asia (South Korea and China). The study protocol and analytical package were released on 11th June 2020 and are iteratively updated via GitHub. We identified three non-mutually exclusive cohorts: 4,537,153 individuals with a clinical COVID-19 diagnosis or positive test, 886,193 hospitalized with COVID-19, and 113,627 hospitalized with COVID-19 requiring intensive services. Results: We aggregated over 22,000 unique characteristics describing patients with COVID-19. All comorbidities, symptoms, medications, and outcomes are described by cohort in aggregate counts and are readily available online. Globally, we observed similarities between the USA and Europe: more women diagnosed than men but more men hospitalized than women, and most diagnosed cases between 25 and 60 years of age versus most hospitalized cases between 60 and 80 years of age. South Korea differed, with more women than men hospitalized. Common comorbidities included type 2 diabetes, hypertension, chronic kidney disease and heart disease. Common presenting symptoms were dyspnea, cough and fever. Symptom data availability was more common in hospitalized cohorts than in diagnosed cohorts. Conclusion: We constructed a global, multi-centre view to describe trends in COVID-19 progression, management and evolution over time. By characterising baseline variability in patients and geography, our work provides critical context that may otherwise be misconstrued as data quality issues. This is important as we perform studies on adverse events of special interest in COVID-19 vaccine surveillance.

6.
J Am Med Inform Assoc ; 29(8): 1350-1365, 2022 07 12.
Article in English | MEDLINE | ID: mdl-35357487

ABSTRACT

OBJECTIVE: This study sought to evaluate whether synthetic data derived from a national coronavirus disease 2019 (COVID-19) dataset could be used for geospatial and temporal epidemic analyses. MATERIALS AND METHODS: Using an original dataset (n = 1 854 968 severe acute respiratory syndrome coronavirus 2 tests) and its synthetic derivative, we compared key indicators of COVID-19 community spread through analysis of aggregate and zip code-level epidemic curves, patient characteristics and outcomes, distribution of tests by zip code, and indicator counts stratified by month and zip code. Similarity between the data was statistically and qualitatively evaluated. RESULTS: In general, synthetic data closely matched original data for epidemic curves, patient characteristics, and outcomes. Synthetic data suppressed labels of zip codes with few total tests (mean = 2.9 ± 2.4; max = 16 tests; 66% reduction of unique zip codes). Epidemic curves and monthly indicator counts were similar between synthetic and original data in a random sample of the most tested (top 1%; n = 171) and for all unsuppressed zip codes (n = 5819), respectively. In small sample sizes, synthetic data utility was notably decreased. DISCUSSION: Analyses on the population-level and of densely tested zip codes (which contained most of the data) were similar between original and synthetically derived datasets. Analyses of sparsely tested populations were less similar and had more data suppression. CONCLUSION: In general, synthetic data were successfully used to analyze geospatial and temporal trends. Analyses using small sample sizes or populations were limited, in part due to purposeful data label suppression, an attribute disclosure countermeasure. Users should consider data fitness for use in these cases.
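The label suppression the abstract reports is a standard small-cell disclosure countermeasure: zip codes with few tests lose their identifying label while their counts are pooled. A minimal sketch under assumed shapes (the threshold and input format here are illustrative, not the actual synthesis pipeline's):

```python
def suppress_sparse_labels(tests_by_zip, threshold):
    """Release per-zip test counts, but strip the zip label from any zip
    whose total falls below `threshold`; suppressed counts are pooled under
    a single unlabeled bucket (None), so totals survive but identity does
    not. The abstract reports suppressed zips averaged 2.9 tests (max 16)."""
    released = {}
    suppressed_total = 0
    for zip_code, n in tests_by_zip.items():
        if n < threshold:
            suppressed_total += n
        else:
            released[zip_code] = n
    if suppressed_total:
        released[None] = suppressed_total
    return released
```

This illustrates why analyses of sparsely tested populations degrade: their geographic labels are exactly the ones removed.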


Subjects
COVID-19, SARS-CoV-2, Cohort Studies, Humans, United States/epidemiology
7.
Learn Health Syst ; 6(1): e10293, 2022 Jan.
Article in English | MEDLINE | ID: mdl-35036557

ABSTRACT

Development of evidence-based practice requires practice-based evidence, which can be acquired through analysis of real-world data from electronic health records (EHRs). The EHR contains volumes of information about patients-physical measurements, diagnoses, exposures, and markers of health behavior-that can be used to create algorithms for risk stratification or to gain insight into associations between exposures, interventions, and outcomes. But to transform real-world data into reliable real-world evidence, one must not only choose the correct analytical methods but also have an understanding of the quality, detail, provenance, and organization of the underlying source data and address the differences in these characteristics across sites when conducting analyses that span institutions. This manuscript explores the idiosyncrasies inherent in the capture, formatting, and standardization of EHR data and discusses the clinical domain and informatics competencies required to transform the raw clinical, real-world data into high-quality, fit-for-purpose analytical data sets used to generate real-world evidence.

8.
medRxiv ; 2021 Jul 08.
Article in English | MEDLINE | ID: mdl-34268525

ABSTRACT

OBJECTIVE: To evaluate whether synthetic data derived from a national COVID-19 data set could be used for geospatial and temporal epidemic analyses. MATERIALS AND METHODS: Using an original data set (n=1,854,968 SARS-CoV-2 tests) and its synthetic derivative, we compared key indicators of COVID-19 community spread through analysis of aggregate and zip-code level epidemic curves, patient characteristics and outcomes, distribution of tests by zip code, and indicator counts stratified by month and zip code. Similarity between the data was statistically and qualitatively evaluated. RESULTS: In general, synthetic data closely matched original data for epidemic curves, patient characteristics, and outcomes. Synthetic data suppressed labels of zip codes with few total tests (mean=2.9±2.4; max=16 tests; 66% reduction of unique zip codes). Epidemic curves and monthly indicator counts were similar between synthetic and original data in a random sample of the most tested (top 1%; n=171) and for all unsuppressed zip codes (n=5,819), respectively. In small sample sizes, synthetic data utility was notably decreased. DISCUSSION: Analyses on the population-level and of densely tested zip codes (which contained most of the data) were similar between original and synthetically derived data sets. Analyses of sparsely tested populations were less similar and had more data suppression. CONCLUSION: In general, synthetic data were successfully used to analyze geospatial and temporal trends. Analyses using small sample sizes or populations were limited, in part due to purposeful data label suppression, an attribute disclosure countermeasure. Users should consider data fitness for use in these cases.

9.
J Med Internet Res ; 23(4): e22796, 2021 04 16.
Article in English | MEDLINE | ID: mdl-33861206

ABSTRACT

BACKGROUND: Asthma affects a large proportion of the population and leads to many hospital encounters, involving both hospitalizations and emergency department visits, every year. To lower the number of such encounters, many health care systems and health plans deploy predictive models to prospectively identify patients at high risk and offer them care management services for preventive care. However, previous models lack sufficient accuracy to serve this purpose well. Embracing the modeling strategy of examining many candidate features, we built a new machine learning model to forecast future asthma hospital encounters of patients with asthma at Intermountain Healthcare, a nonacademic health care system. This model is more accurate than previously published models. However, it is unclear how well our modeling strategy generalizes to academic health care systems, whose patient composition differs from that of Intermountain Healthcare. OBJECTIVE: This study aims to evaluate the generalizability of our modeling strategy to the University of Washington Medicine (UWM), an academic health care system. METHODS: All adult patients with asthma who visited UWM facilities between 2011 and 2018 served as the patient cohort. We considered 234 candidate features. Through a secondary analysis of 82,888 UWM data instances from 2011 to 2018, we built a machine learning model to forecast asthma hospital encounters of patients with asthma in the subsequent 12 months. RESULTS: Our UWM model yielded an area under the receiver operating characteristic curve (AUC) of 0.902. When placing the cutoff point for making binary classification at the top 10% (1,464/14,644) of patients with asthma with the largest forecasted risk, our UWM model yielded an accuracy of 90.6% (13,268/14,644), a sensitivity of 70.2% (153/218), and a specificity of 90.91% (13,115/14,426).
CONCLUSIONS: Our modeling strategy showed excellent generalizability to the UWM, leading to a model with an AUC that is higher than all of the AUCs previously reported in the literature for forecasting asthma hospital encounters. After further optimization, our model could be used to facilitate the efficient and effective allocation of asthma care management resources to improve outcomes. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): RR2-10.2196/resprot.5039.
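The cutoff-based evaluation in the RESULTS is generic: rank patients by forecasted risk, flag the top fraction as positive, and score the flags against observed outcomes. A sketch of that computation, not the authors' code:

```python
def metrics_at_top_fraction(scores, labels, fraction=0.10):
    """Flag the top `fraction` of patients by forecasted risk as positive,
    then report accuracy, sensitivity, and specificity against observed
    outcomes (labels: 1 = had a hospital encounter, 0 = did not)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    k = int(len(ranked) * fraction)
    flagged = [(True, y) for _, y in ranked[:k]] + [(False, y) for _, y in ranked[k:]]
    tp = sum(1 for f, y in flagged if f and y)
    tn = sum(1 for f, y in flagged if not f and not y)
    fp = sum(1 for f, y in flagged if f and not y)
    fn = sum(1 for f, y in flagged if not f and y)
    return {
        "accuracy": (tp + tn) / len(flagged),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }
```

With the top 10% flagged, accuracy is dominated by the specificity of the unflagged 90%, which is why accuracy and specificity track each other closely in the reported numbers.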


Subjects
Asthma, Adult, Asthma/epidemiology, Asthma/therapy, Delivery of Health Care, Forecasting, Hospitals, Humans, Retrospective Studies
10.
J Am Med Inform Assoc ; 28(3): 427-443, 2021 03 01.
Article in English | MEDLINE | ID: mdl-32805036

ABSTRACT

OBJECTIVE: Coronavirus disease 2019 (COVID-19) poses societal challenges that require expeditious data and knowledge sharing. Though organizational clinical data are abundant, these are largely inaccessible to outside researchers. Statistical, machine learning, and causal analyses are most successful with large-scale data beyond what is available in any given organization. Here, we introduce the National COVID Cohort Collaborative (N3C), an open science community focused on analyzing patient-level data from many centers. MATERIALS AND METHODS: The Clinical and Translational Science Award Program and scientific community created N3C to overcome technical, regulatory, policy, and governance barriers to sharing and harmonizing individual-level clinical data. We developed solutions to extract, aggregate, and harmonize data across organizations and data models, and created a secure data enclave to enable efficient, transparent, and reproducible collaborative analytics. RESULTS: Organized in inclusive workstreams, we created legal agreements and governance for organizations and researchers; data extraction scripts to identify and ingest positive, negative, and possible COVID-19 cases; a data quality assurance and harmonization pipeline to create a single harmonized dataset; population of the secure data enclave with data, machine learning, and statistical analytics tools; dissemination mechanisms; and a synthetic data pilot to democratize data access. CONCLUSIONS: The N3C has demonstrated that a multisite collaborative learning health network can overcome barriers to rapidly build a scalable infrastructure incorporating multiorganizational clinical data for COVID-19 analytics. We expect this effort to save lives by enabling rapid collaboration among clinicians, researchers, and data scientists to identify treatments and specialized care and thereby reduce the immediate and long-term impacts of COVID-19.


Subjects
COVID-19, Data Science/organization & administration, Information Dissemination, Intersectoral Collaboration, Computer Security, Data Analysis, Research Ethics Committees, Government Regulation, Humans, National Institutes of Health (U.S.), United States
11.
J Am Med Inform Assoc ; 28(2): 393-401, 2021 02 15.
Article in English | MEDLINE | ID: mdl-33260207

ABSTRACT

Our goal is to summarize the collective experience of 15 organizations in dealing with uncoordinated efforts that result in unnecessary delays in understanding, predicting, preparing for, containing, and mitigating the COVID-19 pandemic in the US. Response efforts involve the collection and analysis of data corresponding to healthcare organizations, public health departments, socioeconomic indicators, as well as additional signals collected directly from individuals and communities. We focused on electronic health record (EHR) data, since EHRs can be leveraged and scaled to improve clinical care, research, and to inform public health decision-making. We outline the current challenges in the data ecosystem and the technology infrastructure that are relevant to COVID-19, as witnessed in our 15 institutions. The infrastructure includes registries and clinical data networks to support population-level analyses. We propose a specific set of strategic next steps to increase interoperability, overall organization, and efficiencies.


Subjects
COVID-19, Electronic Health Records, Information Dissemination, Information Systems/organization & administration, Public Health Practice, Academic Medical Centers, Humans, Registries, United States
12.
medRxiv ; 2020 Oct 27.
Article in English | MEDLINE | ID: mdl-33140068

ABSTRACT

Early identification of symptoms and comorbidities most predictive of COVID-19 is critical to identify infection, guide policies to effectively contain the pandemic, and improve health systems' response. Here, we characterised socio-demographics and comorbidity in 3,316,107 persons tested and 219,072 persons tested positive for SARS-CoV-2 since January 2020, and their key health outcomes in the month following the first positive test. Routine care data from primary care electronic health records (EHR) from Spain, hospital EHR from the United States (US), and claims data from South Korea and the US were used. The majority of study participants were women aged 18-65 years old. The positive/tested ratio varied greatly geographically (2.2:100 to 31.2:100) and over time (from 50:100 in February-April to 6.8:100 in May-June). Fever, cough and dyspnoea were the most common symptoms at presentation. Between 4% and 38% required admission and 1%-10.5% died within a month of their first positive test. Observed disparity in testing practices led to variable baseline characteristics and outcomes, both nationally (US) and internationally. Our findings highlight the importance of large-scale characterisation of COVID-19 international cohorts to inform planning and resource allocation, including testing, as countries face a second wave.

13.
Appl Clin Inform ; 11(3): 387-398, 2020 05.
Article in English | MEDLINE | ID: mdl-32462640

ABSTRACT

BACKGROUND: Early detection and efficient management of sepsis are important for improving health care quality, effectiveness, and costs. Due to its high cost and prevalence, sepsis is a major focus area across institutions, and many studies have emerged over the past years with different models or novel machine learning techniques for early detection of sepsis or potential mortality associated with sepsis. OBJECTIVE: To understand predictive analytics solutions for sepsis patients, either in early detection of onset or mortality. METHODS AND RESULTS: We performed a systematized narrative review and identified common and unique characteristics between the approaches and results of studies that used predictive analytics solutions for sepsis patients. After reviewing 148 retrieved papers, a total of 31 qualifying papers were analyzed, with variance in the models used, including linear regression (n = 2), logistic regression (n = 5), support vector machines (n = 4), and Markov models (n = 4), as well as in population size (range: 24-198,833) and feature count (range: 2-285). Many of the studies used local data sets of varying sizes and locations, while others used the publicly available Medical Information Mart for Intensive Care data. Additionally, vital signs or laboratory test results were commonly used as features for training and testing purposes; however, a few used more unique features, including gene expression data from blood plasma and unstructured text and data from clinician notes. CONCLUSION: Overall, we found variation in the domain of predictive analytics tools for septic patients, from feature and population size to choice of method or algorithm. There are still limitations in the transferability and generalizability of the algorithms or methods used. However, it is evident that implementing predictive analytics tools is beneficial in the early detection of sepsis or death related to sepsis.
Since most of these studies were retrospective, the translational value in the real-world setting in different wards should be further investigated.
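Several of the reviewed models are logistic regressions over vital signs and laboratory results. A toy sketch of that model family, purely for illustration (the feature names, weights, and bias below are invented, not taken from any reviewed study):

```python
import math

def sepsis_risk(features, weights, bias):
    """Logistic-regression style risk score: a weighted sum of clinical
    features (e.g. heart rate, lactate) squashed through the sigmoid to
    yield a probability-like score between 0 and 1."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))
```

In practice the weights would be fit on a labeled cohort; the reviewed studies differ mainly in which features they feed into models of roughly this shape and how far ahead of onset they predict.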


Subjects
Clinical Decision Support Systems, Sepsis, Humans, Machine Learning, Sepsis/diagnosis, Sepsis/mortality, Sepsis/therapy
14.
J Am Med Inform Assoc ; 27(1): 109-118, 2020 01 01.
Article in English | MEDLINE | ID: mdl-31592524

ABSTRACT

OBJECTIVE: Academic medical centers and health systems are increasingly challenged with supporting appropriate secondary use of clinical data. Enterprise data warehouses have emerged as central resources for these data, but often require an informatician to extract meaningful information, limiting direct access by end users. To overcome this challenge, we have developed Leaf, a lightweight self-service web application for querying clinical data from heterogeneous data models and sources. MATERIALS AND METHODS: Leaf utilizes a flexible biomedical concept system to define hierarchical concepts and ontologies. Each Leaf concept contains both textual representations and SQL query building blocks, exposed by a simple drag-and-drop user interface. Leaf generates abstract syntax trees which are compiled into dynamic SQL queries. RESULTS: Leaf is a successful production-supported tool at the University of Washington, which hosts a central Leaf instance querying an enterprise data warehouse with over 300 active users. Through the support of UW Medicine (https://uwmedicine.org), the Institute of Translational Health Sciences (https://www.iths.org), and the National Center for Data to Health (https://ctsa.ncats.nih.gov/cd2h/), Leaf source code has been released into the public domain at https://github.com/uwrit/leaf. DISCUSSION: Leaf allows the querying of single or multiple clinical databases simultaneously, even those of different data models. This enables fast installation without costly extraction or duplication. CONCLUSIONS: Leaf differs from existing cohort discovery tools because it does not specify a required data model and is designed to seamlessly leverage existing user authentication systems and clinical databases in situ. We believe Leaf to be useful for health system analytics, clinical research data warehouses, precision medicine biobanks, and clinical studies involving large patient cohorts.
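The concept-to-SQL idea can be illustrated with a toy compiler: each concept pairs a human-readable label with a SQL predicate building block, and a panel of selected concepts compiles to a single query. This is a sketch of the general approach only; the table and column names are hypothetical, not Leaf's actual schema or AST pipeline:

```python
def compile_panel(concepts):
    """Compile a 'panel' of dragged-in concepts into one cohort query by
    OR-ing each concept's SQL predicate fragment. Each concept is a dict
    with a display label and a sql_where building block."""
    predicates = " OR ".join(f"({c['sql_where']})" for c in concepts)
    return f"SELECT DISTINCT person_id FROM observations WHERE {predicates}"

query = compile_panel([
    {"label": "Type 2 diabetes", "sql_where": "icd10 LIKE 'E11%'"},
    {"label": "Hypertension", "sql_where": "icd10 LIKE 'I10%'"},
])
```

Because the SQL fragments live on the concepts rather than in the application, the same user interface can target different underlying data models by swapping the fragments.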


Subjects
Data Warehousing, Information Storage and Retrieval/methods, Translational Biomedical Research, User-Computer Interface, Controlled Vocabulary, Databases as Topic, Humans, Internet, Unified Medical Language System
15.
JMIR Med Inform ; 6(4): e12241, 2018 Nov 05.
Article in English | MEDLINE | ID: mdl-30401670

ABSTRACT

BACKGROUND: In the United States, health care is fragmented in numerous distinct health care systems including private, public, and federal organizations like private physician groups and academic medical centers. Many patients have their complete medical data scattered across these several health care systems, with no particular system having complete data on any of them. Several major data analysis tasks such as predictive modeling using historical data are considered impractical on incomplete data. OBJECTIVE: Our objective was to find a way to enable these analysis tasks for a health care system with incomplete data on many of its patients. METHODS: This study presents, to the best of our knowledge, the first method to use a geographic constraint to identify a reasonably large subset of patients who tend to receive most of their care from a given health care system. A data analysis task needing relatively complete data can be conducted on this subset of patients. We demonstrated our method using data from the University of Washington Medicine (UWM) and PreManage data covering the use of all hospitals in Washington State. We compared 10 candidate constraints to optimize the solution. RESULTS: For UWM, the best constraint is that the patient has a UWM primary care physician and lives within 5 miles of at least one UWM hospital. About 16.01% (55,707/348,054) of UWM patients satisfied this constraint. Around 69.38% (10,501/15,135) of their inpatient stays and emergency department visits occurred within UWM in the following 6 months, more than double the corresponding percentage for all UWM patients. CONCLUSIONS: Our method can identify a reasonably large subset of patients who tend to receive most of their care from UWM. This enables several major analysis tasks on incomplete medical data that were previously deemed infeasible.
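The winning constraint (a UWM primary care physician plus residence within 5 miles of a UWM hospital) reduces to a boolean test over patient attributes and a great-circle distance. A sketch with illustrative record fields; the coordinate handling assumes geocoded addresses are available:

```python
import math

def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles (haversine formula)."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def satisfies_constraint(patient, hospital_coords, max_miles=5.0):
    """The geographic constraint described above: patient has a UWM primary
    care physician AND lives within `max_miles` of at least one UWM hospital.
    Patient record fields are illustrative."""
    if not patient["has_uwm_pcp"]:
        return False
    return any(
        miles_between(patient["lat"], patient["lon"], hlat, hlon) <= max_miles
        for hlat, hlon in hospital_coords
    )
```

Analyses needing relatively complete data are then run only on the subset of patients for which this predicate is true.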

16.
JMIR Res Protoc ; 6(8): e175, 2017 Aug 29.
Article in English | MEDLINE | ID: mdl-28851678

ABSTRACT

BACKGROUND: To improve health outcomes and cut health care costs, we often need to conduct prediction/classification using large clinical datasets (aka, clinical big data), for example, to identify high-risk patients for preventive interventions. Machine learning has been proposed as a key technology for doing this. Machine learning has won most data science competitions and could support many clinical activities, yet only 15% of hospitals use it for even limited purposes. Despite familiarity with data, health care researchers often lack machine learning expertise to directly use clinical big data, creating a hurdle in realizing value from their data. Health care researchers can work with data scientists with deep machine learning knowledge, but it takes time and effort for both parties to communicate effectively. Facing a shortage in the United States of data scientists and hiring competition from companies with deep pockets, health care systems have difficulty recruiting data scientists. Building and generalizing a machine learning model often requires hundreds to thousands of manual iterations by data scientists to select the following: (1) hyper-parameter values and complex algorithms that greatly affect model accuracy and (2) operators and periods for temporally aggregating clinical attributes (eg, whether a patient's weight kept rising in the past year). This process becomes infeasible with limited budgets. OBJECTIVE: This study's goal is to enable health care researchers to directly use clinical big data, make machine learning feasible with limited budgets and data scientist resources, and realize value from data. 
METHODS: This study will allow us to achieve the following: (1) finish developing the new software, Automated Machine Learning (Auto-ML), to automate model selection for machine learning with clinical big data and validate Auto-ML on seven benchmark modeling problems of clinical importance; (2) apply Auto-ML and novel methodology to two new modeling problems crucial for care management allocation and pilot one model with care managers; and (3) perform simulations to estimate the impact of adopting Auto-ML on US patient outcomes. RESULTS: We are currently writing Auto-ML's design document. We intend to finish our study by around the year 2022. CONCLUSIONS: Auto-ML will generalize to various clinical prediction/classification problems. With minimal help from data scientists, health care researchers can use Auto-ML to quickly build high-quality models. This will boost wider use of machine learning in health care and improve patient outcomes.
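The temporal-aggregation search space the protocol describes (aggregation operators crossed with lookback periods, per clinical attribute) can be sketched as feature enumeration. The operators and periods below are illustrative, not Auto-ML's actual search grid:

```python
from statistics import mean

# Candidate aggregation operators and lookback windows (illustrative).
AGGREGATORS = {"last": lambda xs: xs[-1], "max": max, "min": min, "mean": mean}
PERIODS_DAYS = (30, 90, 365)

def candidate_features(history, now_day):
    """Enumerate temporally aggregated features for one clinical attribute,
    e.g. weight measurements as chronologically ordered (day, value) pairs.
    Selecting among features like these (and among model hyper-parameters)
    is the kind of search Auto-ML is meant to automate."""
    feats = {}
    for days in PERIODS_DAYS:
        window = [v for d, v in history if now_day - d <= days]
        if not window:
            continue
        for name, fn in AGGREGATORS.items():
            feats[f"weight_{name}_{days}d"] = fn(window)
    return feats
```

Even this toy grid yields 12 candidate features from a single attribute, which is why manual selection over hundreds of attributes requires the hundreds-to-thousands of iterations the abstract mentions.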

17.
AMIA Annu Symp Proc ; 2016: 381-390, 2016.
Article in English | MEDLINE | ID: mdl-28269833

ABSTRACT

Clinical data warehouses, initially directed towards clinical research or financial analyses, are evolving to support quality improvement efforts, and must now address the quality improvement life cycle. In addition, data that are needed for quality improvement often do not reside in a single database, requiring easier methods to query data across multiple disparate sources. We created a virtual data warehouse at NewYork-Presbyterian Hospital that allowed us to bring together data from several source systems throughout the organization. We also created a framework to match the maturity of a data request in the quality improvement life cycle to the proper tools needed for each request. As projects progress through the Define, Measure, Analyze, Improve, Control stages of quality improvement, resources are matched to the data needs at each step. We describe the analysis and design that create a robust model for applying clinical data warehousing to quality improvement.


Subjects
Databases as Topic/organization & administration , Hospital Information Systems , Hospitals, University/organization & administration , Quality Improvement , Database Management Systems , Medical Records Systems, Computerized , New York City , Systems Integration
18.
J Am Med Inform Assoc ; 21(2): 204-11, 2014.
Article in English | MEDLINE | ID: mdl-24169275

ABSTRACT

Large amounts of personal health data are being collected and made available through existing and emerging technological media and tools. While use of these data has significant potential to facilitate research, improve quality of care for individuals and populations, and reduce healthcare costs, many policy-related issues must be addressed before their full value can be realized. These include the need for widely agreed-on data stewardship principles and effective approaches to reduce or eliminate data silos and protect patient privacy. AMIA's 2012 Health Policy Meeting brought together healthcare academics, policy makers, and system stakeholders (including representatives of patient groups) to consider these topics and formulate recommendations. A review of a set of Proposed Principles of Health Data Use led to a set of findings and recommendations, including the assertions that the use of health data should be viewed as a public good and that achieving the broad benefits of this use will require understanding and support from patients.


Subjects
Electronic Health Records/standards , Health Policy , Confidentiality/standards , Humans , Information Dissemination , Organizational Policy , Patient Access to Records , Patient Participation , Societies, Medical , United States
19.
AMIA Annu Symp Proc ; 2014: 934-43, 2014.
Article in English | MEDLINE | ID: mdl-25954401

ABSTRACT

Intermountain Healthcare's Mental Health Integration (MHI) Care Process Model (CPM) contains formal scoring criteria for assessing a patient's mental health complexity as "mild," "medium," or "high" based on patient data. The complexity score is intended to help primary care physicians assess their patients' mental health needs and determine which resources will need to be brought to bear. We describe an effort to computerize the scoring. Informatics and MHI personnel collaboratively and iteratively refined the criteria to make them adequately explicit and reflective of MHI objectives. When tested on retrospective data from 540 patients, the clinician agreed with the computer's conclusion in 52.8% of cases (285/540). We considered the analysis sufficiently successful to begin piloting the computerized score in prospective clinical care. So far in the pilot, clinicians have agreed with the computer in 70.6% of cases (24/34).
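Computerizing such criteria amounts to encoding them as explicit rules over structured patient data. The scorer below is hypothetical (the flags, weights, and cutoffs are invented for illustration; the real MHI criteria are more detailed), followed by the agreement calculation from the retrospective test:

```python
def complexity(patient):
    """Classify mental health complexity from a few illustrative flags."""
    score = 0
    score += 2 * patient.get("psychiatric_admissions", 0)
    score += len(patient.get("active_mh_diagnoses", []))
    score += 1 if patient.get("substance_use") else 0
    if score >= 4:
        return "high"
    if score >= 2:
        return "medium"
    return "mild"

patient = {"psychiatric_admissions": 1,
           "active_mh_diagnoses": ["MDD", "GAD"],
           "substance_use": False}
print(complexity(patient))  # high (score = 4)

# Clinician-computer agreement rate, as reported in the retrospective test.
agree, total = 285, 540
print(f"{100 * agree / total:.1f}%")  # 52.8%
```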


Subjects
Algorithms , Delivery of Health Care, Integrated , Mental Disorders/classification , Mental Health/classification , Humans , Pilot Projects , Primary Health Care , Retrospective Studies , Utah
20.
AMIA Annu Symp Proc ; 2014: 1738-47, 2014.
Article in English | MEDLINE | ID: mdl-25954446

ABSTRACT

Electronic health records (EHRs) have been used as a valuable data source for phenotyping. However, this method suffers from inherent data quality issues such as missing data. As patient self-reported health data become increasingly available, it is useful to know how the two data sources compare for phenotyping; this study addresses that question. We used self-reported diabetes status for 2,249 patients treated at Columbia University Medical Center and the well-known eMERGE EHR phenotyping algorithm for Type 2 diabetes mellitus (DM2) to conduct the experiment. The eMERGE algorithm achieved high specificity (.97) but low sensitivity (.32) in this patient cohort. About 87% of the patients with self-reported diabetes had at least one ICD-9 code, one medication, or one lab result supporting a DM2 diagnosis, implying the remaining 13% may have missing or incorrect self-reports. We discuss the tradeoffs in both data sources and in combining them for phenotyping.
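The reported metrics follow directly from the standard confusion-matrix definitions, with self-report taken as the reference standard. The cohort below is made up for illustration, but its counts are chosen to reproduce the paper's pattern of high specificity with low sensitivity (many self-reported cases the algorithm misses):

```python
def sens_spec(pairs):
    """pairs: (algorithm_positive, self_report_positive) booleans."""
    tp = sum(a and s for a, s in pairs)          # both positive
    tn = sum(not a and not s for a, s in pairs)  # both negative
    fp = sum(a and not s for a, s in pairs)      # algorithm-only positive
    fn = sum(not a and s for a, s in pairs)      # self-report-only positive
    return tp / (tp + fn), tn / (tn + fp)

# Toy cohort: concordant negatives plus many missed positives.
pairs = ([(True, True)] * 32 + [(False, True)] * 68
         + [(False, False)] * 97 + [(True, False)] * 3)

sensitivity, specificity = sens_spec(pairs)
print(round(sensitivity, 2), round(specificity, 2))  # 0.32 0.97
```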


Subjects
Algorithms , Diabetes Mellitus , Electronic Health Records , Self Report , Adult , Female , Humans , Male , New York City , Phenotype , Sensitivity and Specificity